Value set: lift offset from numeric constants to expressions #8647

Open: wants to merge 1 commit into base: develop

tautschnig
Collaborator

We can safely track arbitrary expressions as pointer offsets rather than limit ourselves to just constant offsets (and then treating all other expressions as "unknown").
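
To illustrate the idea, here is a minimal sketch in a toy model, not CBMC's actual `value_sett` code. The `Offset` type and the helpers `constant_offset`, `symbolic_offset`, `add_constant_only` and `add_symbolic` are hypothetical names introduced only for this example. It contrasts collapsing every non-constant sum to "unknown" with keeping the sum as an expression.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical toy model (not CBMC's data structures): an offset is either a
// known integer or a symbolic expression kept in printable form.
struct Offset
{
  std::optional<long long> constant; // set only when the offset is a known integer
  std::string expr;                  // printable form, always set
};

Offset constant_offset(long long v)
{
  return Offset{v, std::to_string(v)};
}

Offset symbolic_offset(const std::string &name)
{
  assert(!name.empty());
  return Offset{std::nullopt, name};
}

// Constant-only tracking: any non-constant operand yields "unknown",
// modelled here as an empty optional.
std::optional<long long> add_constant_only(const Offset &a, const Offset &b)
{
  if(a.constant && b.constant)
    return *a.constant + *b.constant;
  return std::nullopt;
}

// Expression tracking: fold constants when possible, otherwise keep the sum
// as a symbolic expression instead of discarding it.
Offset add_symbolic(const Offset &a, const Offset &b)
{
  if(a.constant && b.constant)
    return constant_offset(*a.constant + *b.constant);
  return Offset{std::nullopt, "(" + a.expr + " + " + b.expr + ")"};
}
```

In this sketch, adding the symbol `i` to the constant `8` gives no information under constant-only tracking, whereas the expression-tracking variant retains `(i + 8)` for later simplification.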

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

codecov bot commented Jun 3, 2025

Codecov Report

Attention: Patch coverage is 73.68421% with 15 lines in your changes missing coverage. Please review.

Project coverage is 80.36%. Comparing base (eef9677) to head (96844b5).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pointer-analysis/value_set.cpp | 72.72% | 15 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8647      +/-   ##
===========================================
- Coverage    80.36%   80.36%   -0.01%     
===========================================
  Files         1688     1688              
  Lines       207067   207073       +6     
  Branches        73       73              
===========================================
- Hits        166418   166414       -4     
- Misses       40649    40659      +10     

@tautschnig tautschnig assigned kroening and unassigned tautschnig Jun 3, 2025
@@ -981,7 +981,7 @@ normalize(const object_descriptor_exprt &expr, const namespacet &ns)
{
return expr;
}
-if(expr.offset().id() == ID_unknown)
+if(!expr.offset().is_constant())
Collaborator

@remi-delmas-3000 remi-delmas-3000 Jun 18, 2025

Can we get a high-level description of the normal form we're trying to reach?

Member

The normal form is root object + constant offset, since the object can otherwise be an arbitrary access path into the root object. Pointer equality checks then become trivial. Maybe simplify_expr has become good enough in the meantime.
Anyway, this seems orthogonal to this PR.
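
As a rough illustration of why such a normal form makes equality checks trivial, consider the following toy sketch. It is not CBMC's `normalize` function: `Step`, `Normalized`, `normalize_path` and `same_address` are hypothetical names, and each step of an access path is assumed to have already been reduced to the number of bytes it adds.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: one step of an access path (a member access or a constant
// array index), reduced to the number of bytes it contributes.
struct Step
{
  long long byte_offset;
};

// Hypothetical normal form: a root object name plus one constant byte offset.
struct Normalized
{
  std::string root;
  long long offset;
};

// Collapse an access path into the root object into a single constant offset.
Normalized normalize_path(const std::string &root, const std::vector<Step> &path)
{
  assert(!root.empty());
  long long total = 0;
  for(const Step &s : path)
    total += s.byte_offset;
  return Normalized{root, total};
}

// Once normalized, two descriptors denote the same address exactly when both
// the root and the offset agree, so no structural comparison of paths is needed.
bool same_address(const Normalized &a, const Normalized &b)
{
  return a.root == b.root && a.offset == b.offset;
}
```

Under these assumptions, two different access paths that sum to the same byte offset in the same root object compare equal after normalization.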

@@ -184,7 +183,7 @@ void value_sett::output(std::ostream &out, const std::string &indent) const
stream << "<" << format(o) << ", ";

if(o_it->second)
-stream << *o_it->second;
+stream << format(*o_it->second);
Collaborator

@remi-delmas-3000 remi-delmas-3000 Jun 18, 2025

Now we have to print an expression instead of a mere integer.

Collaborator Author

Yes, but why is that a concern?

Collaborator

@remi-delmas-3000 remi-delmas-3000 left a comment

I have one remaining question: now that we have symbolic offsets for pointer expressions instead of just "constants" or "unknown", is there any way to use that to compute more precise results in get_value_set_rec? I know it lets us be more precise when modelling assignments, but I don't understand why we don't get a similar gain in precision when computing dereferences or traversing value_sets.

Another question: now that the value set representation "knows" that an expression array[i] has an offset of the form i * sizeof(T), could we take extra constraints on i into account during the value set traversal? Say we're trying to resolve array[i] in the context of a basic loop invariant 0 <= i && i <= len(array): knowing range constraints on i, we could perhaps avoid injecting values representing OOB accesses into the value set for array[i].
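
The second question can be made concrete with a small sketch. This is a toy interval check, not something the value set analysis is known to do: `Interval`, `offset_range` and `always_in_bounds` are hypothetical names, and the index range is assumed to come from some external source such as a loop invariant.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inclusive integer interval [lo, hi].
struct Interval
{
  std::int64_t lo;
  std::int64_t hi;
};

// Byte offsets reachable by array[i] when i ranges over `index` and each
// element occupies `elem_size` bytes (offset = i * sizeof(T)).
Interval offset_range(Interval index, std::int64_t elem_size)
{
  assert(elem_size > 0 && index.lo <= index.hi);
  return Interval{index.lo * elem_size, index.hi * elem_size};
}

// True when every offset in `offsets` addresses a whole element inside an
// array of `length` elements, i.e. no out-of-bounds access is possible.
bool always_in_bounds(
  Interval offsets,
  std::int64_t elem_size,
  std::int64_t length)
{
  return offsets.lo >= 0 && offsets.hi + elem_size <= length * elem_size;
}
```

Note that in this sketch an index range of 0 <= i <= len(array) still includes the one-past-the-end element, so only the stricter range 0 <= i < len(array) would rule out OOB accesses.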

Member

@peterschrammel peterschrammel left a comment

It seems we are lacking tests in 4 places here. Given that this change is anything but trivial, it would be great to find some test cases that trigger these.

@@ -362,7 +361,8 @@ bool value_sett::eval_pointer_offset(
if(!ptr_offset.has_value())
return false;

-*ptr_offset += *it->second;
+*ptr_offset +=
Member

Codecov says that 361-368 are not covered by any tests.

if(!i.has_value())
i = mp_integer{0};
i = *i + *offset;
additional_offset = plus_exprt{
Member

Codecov says that 731-733 are not covered by any tests.

}
else
{
*i *= *size;
additional_offset = mult_exprt{
*additional_offset, from_integer(*size, additional_offset->type())};

if(expr.id()==ID_minus)
Member

Codecov says that 758-766 are not covered by any tests.

{
auto size = pointer_offset_size(array_type.element_type(), ns);

if(!size.has_value() || *size == 0)
o.reset();
else
*o = *i * (*size);
{
Member

Codecov says that 1416-1431 are not covered by any tests.

@tautschnig
Collaborator Author

> I have one remaining question: now that we have symbolic offsets for pointer expressions instead of just "constants" or "unknown", is there any way to use that to compute more precise results in get_value_set_rec? I know it lets us be more precise when modelling assignments, but I don't understand why we don't get a similar gain in precision when computing dereferences or traversing value_sets.

I have already seen this happen, although it isn't necessarily obvious unless one examines the formula that symex produces. #8653 is a consequence of my observations: I was surprised to still find "unknown" where I had expected a known offset.

> Another question: now that the value set representation "knows" that an expression array[i] has an offset of the form i * sizeof(T), could we take extra constraints on i into account during the value set traversal? Say we're trying to resolve array[i] in the context of a basic loop invariant 0 <= i && i <= len(array): knowing range constraints on i, we could perhaps avoid injecting values representing OOB accesses into the value set for array[i].

I'm not sure we even create those OOB values here.
